136 research outputs found

    Els recursos hídrics del riu Sénia

    Get PDF

    Acceleration-as-a-Service: Exploiting Virtualised GPUs for a Financial Application

    Get PDF
    'How can GPU acceleration be obtained as a service in a cluster?' This question has become increasingly significant given the inefficiency of installing GPUs on every node of a cluster. The research reported in this paper addresses the question by employing rCUDA (remote CUDA), a framework that facilitates Acceleration-as-a-Service (AaaS), such that the nodes of a cluster can request acceleration from a set of remote GPUs on demand. The rCUDA framework exploits virtualisation so that multiple nodes can share the same GPU. In this paper we test the feasibility of the rCUDA framework on a real-world application employed in the financial risk industry that can benefit from AaaS in a production setting. The results confirm the feasibility of rCUDA and highlight that rCUDA achieves performance similar to CUDA, provides consistent results, and, more importantly, allows a single application to benefit from all the GPUs available in the cluster without losing efficiency.
    Comment: 11th IEEE International Conference on eScience (IEEE eScience) - Munich, Germany, 201
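    Because rCUDA forwards CUDA Runtime API calls from the client node to the node that owns the GPU, the application itself needs no source changes. The following is only a minimal sketch, assuming a standard rCUDA client installation; the environment variables shown follow the naming used in rCUDA documentation and may differ between versions, and the server name gpu-server-0 is hypothetical.

    // Ordinary CUDA program; under rCUDA it can run on a node with no local GPU.
    // Illustrative client-side configuration (names per rCUDA docs, to be verified):
    //   export RCUDA_DEVICE_COUNT=1              # number of remote GPUs to expose
    //   export RCUDA_DEVICE_0=gpu-server-0:0     # <server>[:<gpu-index>], hypothetical host
    //   export LD_LIBRARY_PATH=<rCUDA client lib>:$LD_LIBRARY_PATH
    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    __global__ void vecAdd(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float *ha = (float *)malloc(bytes), *hb = (float *)malloc(bytes), *hc = (float *)malloc(bytes);
        for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

        float *da, *db, *dc;
        cudaMalloc((void **)&da, bytes); cudaMalloc((void **)&db, bytes); cudaMalloc((void **)&dc, bytes);
        cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

        // Each runtime call above and below is intercepted by the rCUDA client
        // library and executed on the remote GPU that serves this node.
        vecAdd<<<(n + 255) / 256, 256>>>(da, db, dc, n);
        cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
        printf("c[0] = %f\n", hc[0]);   // expected 3.0

        cudaFree(da); cudaFree(db); cudaFree(dc);
        free(ha); free(hb); free(hc);
        return 0;
    }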

    Multi-Tenant Virtual GPUs for Optimising Performance of a Financial Risk Application

    Get PDF
    Graphics Processing Units (GPUs) are becoming popular accelerators in modern High-Performance Computing (HPC) clusters. Installing GPUs on every node of the cluster is not efficient, resulting in high costs and power consumption as well as underutilisation of the accelerators. The research reported in this paper is motivated by the use of a few physical GPUs, providing cluster nodes with on-demand access to remote GPUs for a financial risk application. We hypothesise that sharing GPUs between several nodes, referred to as multi-tenancy, reduces the execution time and the energy consumed by an application. Two data transfer modes between the CPU and the GPUs, namely concurrent and sequential, are explored. The key result from the experiments is that multi-tenancy with a few physical GPUs using sequential data transfers lowers the execution time and the energy consumed, thereby improving the overall performance of the application.
    Comment: Accepted to the Journal of Parallel and Distributed Computing (JPDC), 10 June 201
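    As a generic illustration of the two transfer modes mentioned above (not the authors' benchmark code), the sketch below moves the same set of buffers to the GPU either sequentially, with one blocking copy completing before the next starts, or concurrently, with asynchronous copies issued on independent CUDA streams; under remote GPU virtualization each copy additionally traverses the network to the node that owns the GPU, so the ordering determines how the link is shared.

    #include <cuda_runtime.h>

    #define CHUNKS 4

    // Sequential mode: one blocking copy after another on the default stream.
    static void copy_sequential(float *dst[CHUNKS], float *src[CHUNKS], size_t bytes) {
        for (int i = 0; i < CHUNKS; ++i)
            cudaMemcpy(dst[i], src[i], bytes, cudaMemcpyHostToDevice);
    }

    // Concurrent mode: asynchronous copies on independent streams so they may
    // overlap with each other (and with kernels) on the device and interconnect.
    static void copy_concurrent(float *dst[CHUNKS], float *src[CHUNKS], size_t bytes) {
        cudaStream_t s[CHUNKS];
        for (int i = 0; i < CHUNKS; ++i) cudaStreamCreate(&s[i]);
        for (int i = 0; i < CHUNKS; ++i)
            cudaMemcpyAsync(dst[i], src[i], bytes, cudaMemcpyHostToDevice, s[i]);
        for (int i = 0; i < CHUNKS; ++i) { cudaStreamSynchronize(s[i]); cudaStreamDestroy(s[i]); }
    }

    int main() {
        size_t bytes = (1 << 20) * sizeof(float);
        float *h[CHUNKS], *d[CHUNKS];
        for (int i = 0; i < CHUNKS; ++i) {
            cudaMallocHost((void **)&h[i], bytes);   // pinned memory, needed for truly async copies
            cudaMalloc((void **)&d[i], bytes);
        }
        copy_sequential(d, h, bytes);
        copy_concurrent(d, h, bytes);
        for (int i = 0; i < CHUNKS; ++i) { cudaFreeHost(h[i]); cudaFree(d[i]); }
        return 0;
    }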

    Improving Performance and Energy Efficiency of Heterogeneous Systems with rCUDA

    Full text link
    Thesis by compendium of publications. [EN] In the last decade the use of GPGPU (General Purpose computing in Graphics Processing Units) has become extremely popular in data centers around the world. GPUs (Graphics Processing Units) have been established as computational accelerators that are used alongside CPUs to form heterogeneous systems. The massively parallel nature of GPUs, traditionally intended for graphics computing, makes it possible to perform numerical operations on data arrays at high speed, thanks to the large number of cores GPUs integrate and their large memory bandwidth. Consequently, applications from all kinds of fields, such as chemistry, physics, engineering, artificial intelligence and materials science, that present this type of computational pattern benefit by drastically reducing their execution time. In general, the computing acceleration provided by GPUs has meant a step forward and a revolution, but it is not without problems, such as poor energy efficiency, low utilization of the GPUs, and high acquisition and maintenance costs. In this PhD thesis we aim to analyze the main shortcomings of these heterogeneous systems and propose solutions based on the use of remote GPU virtualization. To that end, we have used the rCUDA middleware, developed at Universitat Politècnica de València; many publications support rCUDA as the most advanced remote GPU virtualization framework available today. The results obtained in this PhD thesis show that the use of rCUDA in Cloud Computing environments increases the degree of freedom of the system, as it allows virtual instances of the physical GPUs to be created and fully tailored to the needs of each of the virtual machines. In HPC (High Performance Computing) environments, rCUDA also provides a greater degree of flexibility in the use of GPUs throughout the computing cluster, as it allows the CPU part of an application to be completely decoupled from its GPU part. In addition, GPUs can be on any node in the cluster, regardless of the node on which the CPU part of the application is running. In general, both for Cloud Computing and for HPC, this greater degree of flexibility translates into an up to 2x increase in system-wide throughput while reducing energy consumption by approximately 15%. Finally, we have also developed a job migration mechanism for the GPU part of applications that has been integrated within the rCUDA middleware. This migration mechanism has been evaluated, and the results clearly show that, in exchange for a small overhead of about 400 milliseconds in the execution time of the applications, it is a powerful tool with which, again, we can increase productivity and reduce the energy footprint of the computing system. In summary, this PhD thesis analyzes the main problems arising from the use of GPUs as computing accelerators, both in HPC and Cloud Computing environments, and demonstrates how, thanks to the rCUDA middleware, these problems can be addressed.
In addition, a powerful GPU job migration mechanism has been developed which, integrated within the rCUDA framework, becomes a key tool for future job schedulers in heterogeneous clusters. This work was jointly supported by the Fundación Séneca (Agencia Regional de Ciencia y Tecnología, Región de Murcia) under grants 20524/PDC/18, 20813/PI/18 and 20988/PI/18, and by the Spanish MEC and European Commission FEDER under grants TIN2015-66972-C5-3-R, TIN2016-78799-P and CTQ2017-87974-R (AEI/FEDER, UE). We also thank NVIDIA for hardware donated under the GPU Educational Center 2014-2016 and Research Center 2015-2016 programs. The authors thankfully acknowledge the computer resources at CTE-POWER and the technical support provided by Barcelona Supercomputing Center - Centro Nacional de Supercomputación (RES-BCV-2018-3-0008). Furthermore, researchers from Universitat Politècnica de València are supported by the Generalitat Valenciana under Grant PROMETEO/2017/077. Authors are also grateful for the generous support provided by Mellanox Technologies Inc. Prof. Pradipta Purkayastha, from the Department of Chemical Sciences, Indian Institute of Science Education and Research (IISER) Kolkata, is acknowledged for kindly providing the initial ligand and DNA structures.
Prades Gasulla, J. (2021). Improving Performance and Energy Efficiency of Heterogeneous Systems with rCUDA [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/168081

    GPU-Job Migration: The rCUDA Case

    Full text link
    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. [EN] Virtualization techniques have been shown to bring benefits to data centers and other computing facilities. In this regard, not only do virtual machines allow the size of the computing infrastructure to be reduced while increasing overall resource utilization, but virtualizing individual components of computers may also provide significant benefits. This is the case, for instance, of the remote GPU virtualization technique, implemented in several frameworks in recent years. The large degree of flexibility provided by the remote GPU virtualization technique can be further increased by applying a migration mechanism to it, so that the GPU part of applications can be live-migrated to another GPU elsewhere in the cluster during execution time in a transparent way. In this paper we present the implementation of the migration mechanism within the rCUDA remote GPU virtualization middleware, together with a thorough performance analysis of that implementation. To that end, we leverage both synthetic and real production applications as well as three different generations of NVIDIA GPUs. Additionally, two different versions of the InfiniBand interconnect are used in this study. Several use cases are provided in order to show the extraordinary benefits that the GPU-job migration mechanism can bring to data centers. This work was funded by the Generalitat Valenciana under Grant PROMETEO/2017/77. Authors are grateful for the generous support provided by Mellanox Technologies Inc.
    Prades, J.; Silla Jiménez, F. (2019). GPU-Job Migration: The rCUDA Case. IEEE Transactions on Parallel and Distributed Systems. 30(12):2718-2729. https://doi.org/10.1109/TPDS.2019.2924433
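    The migration mechanism itself lives inside the rCUDA middleware and is not reproduced in the abstract, so the following is only a conceptual sketch under the assumption that migrating the GPU part of a job at least involves draining in-flight work, staging device allocations through host memory, and recreating them on the destination GPU. All names (Allocation, migrate_allocations) are hypothetical; this is not rCUDA's actual implementation, which also handles contexts, streams, events and modules and hides the whole process from the application.

    #include <cuda_runtime.h>
    #include <vector>

    // Conceptual sketch only: in rCUDA this logic is internal to the middleware.
    struct Allocation {
        void  *dev_ptr;   // pointer on the GPU currently serving the job
        size_t bytes;
    };

    // Move every tracked allocation from GPU 'src_dev' to GPU 'dst_dev'.
    void migrate_allocations(std::vector<Allocation> &allocs, int src_dev, int dst_dev) {
        cudaSetDevice(src_dev);
        cudaDeviceSynchronize();                       // drain pending kernels and copies

        std::vector<std::vector<char>> staging(allocs.size());
        for (size_t i = 0; i < allocs.size(); ++i) {   // stage device memory in host buffers
            staging[i].resize(allocs[i].bytes);
            cudaMemcpy(staging[i].data(), allocs[i].dev_ptr, allocs[i].bytes, cudaMemcpyDeviceToHost);
            cudaFree(allocs[i].dev_ptr);
        }

        cudaSetDevice(dst_dev);
        for (size_t i = 0; i < allocs.size(); ++i) {   // restore on the destination GPU
            cudaMalloc(&allocs[i].dev_ptr, allocs[i].bytes);
            cudaMemcpy(allocs[i].dev_ptr, staging[i].data(), allocs[i].bytes, cudaMemcpyHostToDevice);
        }
        // From here on, the middleware would redirect subsequent CUDA calls to dst_dev,
        // which is what makes the migration transparent to the application.
    }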

    Una antropología en acción para el futuro de Europa

    Get PDF
    Despite the atrocities against the life and dignity of man that took place in Europe during the 20th century, man has not been destroyed; rather, he has always been reborn with strength. Nevertheless, a great confusion currently reigns in the old continent about the characteristics and meaning of the human condition. The author, starting from the biblical category of «image of God», proposes and develops an «anthropology in action» that helps to illuminate the true stature of the human person. From this anthropological framework, he also suggests some implications for education in European social life.

    Instalación, configuración y evaluación de la red de interconexión EXTOLL en un entorno de memoria distribuida

    Full text link
    EXTOLL is a new interconnection network architecture, developed at the University of Heidelberg, that aims to establish itself in the HPCC sector as a highly efficient and economical alternative. EXTOLL provides effective communication mechanisms both for small messages and for the transmission of large amounts of data through Remote Direct Memory Access (RDMA). It is worth noting that the EXTOLL design, currently under development, is implemented on a Field Programmable Gate Array (FPGA), which allows modifications but limits performance. To reduce costs and increase performance, the complete design is expected to be integrated into an Application-Specific Integrated Circuit (ASIC) chip in the near future. The results obtained show that the performance achieved with EXTOLL is higher than that of Gigabit Ethernet, with an average speedup of 4x. To complete the performance comparison, we adapted a parallel routine that generates images of the Mandelbrot fractal set in order to observe, in real time, the performance gap between EXTOLL and Gigabit Ethernet. The modification made to this application generates a large amount of communication, so the performance differences between the interconnects are reflected in the speed at which the images are generated.
    Prades Gasulla, J. (2014). Instalación, configuración y evaluación de la red de interconexión EXTOLL en un entorno de memoria distribuida. http://hdl.handle.net/10251/39652

    On the Effect of using rCUDA to Provide CUDA Acceleration to Xen Virtual Machines

    Full text link
    [EN] Nowadays, many data centers use virtual machines (VMs) in order to achieve a more efficient use of hardware resources. The use of VMs provides a reduction in equipment and maintenance expenses as well as lower electricity consumption. Nevertheless, current virtualization solutions, such as Xen, do not easily provide graphics processing units (GPUs) to applications running in the virtualized domain with the flexibility usually required in data centers (i.e., managing virtual GPU instances and concurrently sharing them among several VMs). Therefore, the execution of GPU-accelerated applications within VMs is hindered by this lack of flexibility. In this regard, remote GPU virtualization solutions may address this concern. In this paper we analyze the use of the remote GPU virtualization mechanism to accelerate scientific applications running inside Xen VMs. We conduct our study with six different applications, namely CUDA-MEME, CUDASW++, GPU-BLAST, LAMMPS, a triangle count application, referred to as TRICO, and a synthetic benchmark used to emulate different application behaviors. Our experiments show that the use of remote GPU virtualization is a feasible approach to address the current concerns of sharing GPUs among several VMs, featuring a very low overhead if an InfiniBand fabric is already present in the cluster. This work was funded by the Generalitat Valenciana under Grant PROMETEO/2017/077. Authors are also grateful for the generous support provided by Mellanox Technologies Inc.
    Prades, J.; Reaño González, C.; Silla Jiménez, F. (2019). On the Effect of using rCUDA to Provide CUDA Acceleration to Xen Virtual Machines. Cluster Computing. 22(1):185-204. https://doi.org/10.1007/s10586-018-2845-0
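    The abstract mentions a synthetic benchmark used to emulate different application behaviors but does not describe it, so the sketch below is only one plausible, hypothetical form of such a benchmark: the amount of host-device data moved and the arithmetic intensity of the kernel are both command-line parameters, which lets the compute-to-communication ratio, a factor that strongly affects remote GPU virtualization overhead, be swept.

    #include <cstdio>
    #include <cstdlib>
    #include <cstring>
    #include <cuda_runtime.h>

    // Hypothetical synthetic benchmark: 'iters' controls compute intensity,
    // 'mb' controls how much data crosses the (possibly remote) transfer path.
    __global__ void burn(float *data, size_t n, int iters) {
        size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
        if (i >= n) return;
        float v = data[i];
        for (int k = 0; k < iters; ++k)      // arithmetic-heavy loop
            v = v * 1.000001f + 0.5f;
        data[i] = v;
    }

    int main(int argc, char **argv) {
        int    mb    = (argc > 1) ? atoi(argv[1]) : 64;    // transfer size in MiB
        int    iters = (argc > 2) ? atoi(argv[2]) : 1000;  // compute iterations
        size_t n     = (size_t)mb * 1024 * 1024 / sizeof(float);

        float *h = (float *)malloc(n * sizeof(float));
        memset(h, 0, n * sizeof(float));
        float *d; cudaMalloc((void **)&d, n * sizeof(float));

        cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);   // communication phase
        burn<<<(unsigned)((n + 255) / 256), 256>>>(d, n, iters);       // computation phase
        cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaDeviceSynchronize();

        printf("emulated workload: %d MiB moved, %d compute iterations\n", mb, iters);
        cudaFree(d); free(h);
        return 0;
    }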